Formal Derivation of Distributed MapReduce
نویسندگان
چکیده
MapReduce is a powerful distributed data processing model that is currently adopted in a wide range of domains to efficiently handle large volumes of data, i.e., cope with the big data surge. In this paper, we propose an approach to formal derivation of the MapReduce framework. Our approach relies on stepwise refinement in Event-B and, in particular, the event refinement structure approach – a diagrammatic notation facilitating formal development. Our approach allows us to derive the system architecture in a systematic and well-structured way. The main principle of MapReduce is to parallelise processing of data by first mapping them to multiple processing nodes and then merging the results. To facilitate this, we formally define interdependencies between the map and reduce stages of MapReduce. This formalisation allows us to propose an alternative architectural solution that weakens blocking between the stages and, as a result, achieves a higher degree of parallelisation of MapReduce computations.
منابع مشابه
Distributed Formal Concept Analysis Algorithms Based on an Iterative MapReduce Framework
While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration we introduce a distributed approach for performing formal concept mining. Our method has its novelty in that we use a light-weight MapReduce runtime called Twister which is better suited to iterative algorithms...
متن کاملAdaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...
متن کاملMerging Closed Pattern Sets in Distributed Multi-Relational Data
We consider the problem of mining closed patterns from multi-relational databases in a distributed environment. Given two local databases (horizontal partitions) and their sets of closed patterns (concepts), we generate the set of closed patterns in the global database by utilizing the merge (or subposition) operator, studied in the field of Formal Concept Analysis. Since the execution times of...
متن کاملTransactional Support in MapReduce for Speculative Parallelism
MapReduce has emerged as a popular programming model for large-scale distributed computing. Its framework enforces strict synchronization between successive map and reduce phases and limited data-sharing within a phase. Use of keyvalue based persistent storage with MapReduce presents intriguing opportunities and challenges. These challenges relate primarily to semantic inconsistencies arising f...
متن کاملScather: programming with multi-party computation and MapReduce
We present a prototype of a distributed computational infrastructure, an associated highlevel programming language, and an underlying formal framework that allow multiple parties to leverage their own cloud-based computational resources (capable of supporting MapReduce [27] operations) in concert with multi-party computation (MPC) to execute statistical analysis algorithms that have privacy-pre...
متن کامل